Measuring Limits of Fine-grained Parallelism

Author

  • Andrew W. Appel
Abstract

First, the limits of low-level parallelism for a collection of real-world programs are investigated and Wall's results are verified. Maximum parallelism of symbolic benchmarks is investigated on both superpipelined and superscalar architectures, and the effects of register renaming are considered. We then examine the effects of garbage collection strategies on low-level parallelism. In particular, we examine whether a garbage collector should optimize for CAR or CDR accesses. We also consider another model in which CONS cells are split to make dereferencing fast, at the cost of shifting memory access to the fetch of the address. We show that a traditional depth-first copying garbage collector would better reduce run time on a low-level parallel machine with speculative execution if it were to follow CDR links before CAR links. Simulations are our principal data-gathering method. We have developed an instruction-level MIPS R3000 simulator called mipsi. Mipsi emulates some of the kernel's functionality and is thus able to fully simulate a user-level process without having to simulate kernel code as well. By adding appropriate schedulers to mipsi, we can simulate very powerful parallel machines based on the MIPS chip. A step further, we can simulate aspects of the run-time system, in particular the garbage collector, by modeling how much certain memory accesses would have cost had the garbage collector been implemented.
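The CDR-before-CAR result can be illustrated with a small sketch. The following is not the paper's implementation; it is a hypothetical Python model of a depth-first copying collector over cons cells, where the `cdr_first` flag controls traversal order. Copying the CDR link first allocates the spine of a list contiguously in to-space, which is the layout the abstract argues favors speculative execution:

```python
# Hypothetical sketch (names are illustrative, not from mipsi):
# a depth-first copying collector over cons cells, parameterized by
# whether CDR links are followed before CAR links.

class Cons:
    def __init__(self, car, cdr):
        self.car = car
        self.cdr = cdr

def copy_depth_first(root, cdr_first=True):
    """Copy a cons-cell graph depth-first.

    Returns (new_root, to_space), where to_space lists the copied
    cells in allocation order, modeling their layout in memory.
    """
    forwarded = {}   # id(old cell) -> new cell (forwarding pointers)
    to_space = []    # allocation order in to-space

    def copy(cell):
        if not isinstance(cell, Cons):
            return cell                  # immediate value (int, nil, ...)
        if id(cell) in forwarded:
            return forwarded[id(cell)]   # already copied: follow forward
        new = Cons(None, None)
        forwarded[id(cell)] = new
        to_space.append(new)             # allocate before recursing
        if cdr_first:
            new.cdr = copy(cell.cdr)     # spine cells land contiguously
            new.car = copy(cell.car)
        else:
            new.car = copy(cell.car)     # sublists interleave with the spine
            new.cdr = copy(cell.cdr)
        return new

    return copy(root), to_space
```

For a list of sublists such as `((1 2) (3 4))`, CDR-first copying places the two spine cells at the front of to-space, one after the other; CAR-first copying scatters the spine cells between the copied sublists.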


Related articles

Instruction Placement for an EDGE Multicore Processor Using Reinforcement Learning

Communication overheads are one of the fundamental challenges in a multiprocessor system. As the number of processors on a chip increases, communication overheads and the distribution of computation and data become increasingly significant. The granularity of communication between processors on a single chip in future systems will determine how significant these overheads will be. Fine-grained ...


Extended Parallelism Models For Optimization On Massively Parallel Computers

Single-level parallel optimization approaches, those in which either the simulation code executes in parallel or the optimization algorithm invokes multiple simultaneous single-processor analyses, have been investigated previously and have been shown to be effective in reducing the time required to compute optimal solutions. However, these approaches have clear performance limitatio...


A Multiprocessor Architecture Combining Fine-Grained and Coarse-Grained Parallelism Strategies

A wide variety of computer architectures have been proposed that attempt to exploit parallelism at different granularities. For example, pipelined processors and multiple instruction issue processors exploit the fine-grained parallelism available at the machine instruction level, while shared memory multiprocessors exploit the coarse-grained parallelism available at the loop level. Using a regi...


Effect of Adding Nanoclay on the Mechanical Behaviour of Fine-grained Soil Reinforced with Polypropylene Fibers

In this study, the performance of clay nano-particles in soil reinforced with polypropylene fibers (PP-fiber) has been investigated. A series of investigations concerning the effect of random orientation of fibers on the engineering behaviour of soil were also conducted. Soil mixtures were modified with varying percentages of nanoclay and fibers. Unconfined compressive strength (UCS), Compac...


Balancing Fine- and Medium-Grained Parallelism in Scheduling Loops for the XIMD Architecture

This paper presents an approach to scheduling loops that leverages the distinctive architectural features of the XIMD, particularly the variable number of instruction streams and low synchronization cost. The classical VLIW and MIMD architectures have a fixed number of instruction streams, each with a fixed width. A compiler for the XIMD architecture can exploit fine-grained parallelism within ...


Supporting Coarse and Fine Grain Parallelism in an Extension of ML

We have built an extension of Standard ML aimed at multicomputer platforms with distributed memories. The resulting language, paraML, differs from other extensions by including and differentiating both coarse-grained and fine-grained parallelism. The basis for coarse-grained parallelism in paraML is process creation where there is no sharing of data, with communication between processes via asy...



Journal:

Volume   Issue 

Pages  -

Publication year: 1997